Review #2412

Closed
wants to merge 80 commits into from

Conversation

atobiszei (Collaborator)

No description provided.

dkalinowski and others added 30 commits March 6, 2024 11:55
- fix 404s due to openvino link structure change
- 2023.3 -> 2024 where necessary
- spelling fixes
CVS-135106
---------

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
* validate class and execute method existence, extend the pyovms.Tensor constructor, fix the issue of finalize not being called, print with flush in demos (see the sketch below)
Fixed bugs in the C-API benchmark app, documented it, and created a demo showcasing its features
---------

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
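For context, a minimal sketch of a Python node that these validation checks apply to, assuming the OvmsPythonModel class and initialize/execute/finalize hooks used by OVMS Python nodes; the input and output names here are made up:

```python
# Minimal sketch of an OVMS Python node: the class must exist and define
# an execute method, which is what the validation above checks for.
from pyovms import Tensor

class OvmsPythonModel:
    def initialize(self, kwargs: dict):
        print("node initialized", flush=True)  # flush so logs appear promptly

    def execute(self, inputs: list) -> list:
        # Echo the first input back under a new name; the extended Tensor
        # constructor accepts a name and a bytes-like buffer.
        data = bytes(inputs[0])
        return [Tensor("output", data)]

    def finalize(self):
        # Called on node teardown; the fix above ensures this actually runs.
        print("node finalized", flush=True)
```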
* Resolve Python node TODOs
smart building depending on the changed content
parallel test execution
build performance optimization
With the verbose flag enabled, unpacking the boost tar file alone adds ~67,000
lines of messages to the build logs, which makes auditing the build process
challenging.
* Allow flag injection to pugixml

This commit contains a patch that adds CXX and linker flag variables to the
CMakeLists.txt file. The patch is applied during the build so that build flags
can later be injected on the cmake command line.

* exclude header check
* fix Dockerfile sequence
* set UBI as the default base image

---------

Co-authored-by: Steve Grubb <ausearch.1@gmail.com>
* Add string output demo
* Add support for the _contents fields in KServe request inputs for MediaPipe, across all deserialization paths (see the sketch below)

---------

Co-authored-by: atobisze <adrian.tobiszewski@intel.com>
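As a rough illustration of a request exercising these fields, a sketch using the KServe v2 gRPC protos bundled with tritonclient; the model name, input name, and port are hypothetical:

```python
# Sketch of a KServe v2 gRPC request that uses the typed *_contents fields
# instead of raw_input_contents; names and addresses are placeholders.
import grpc
from tritonclient.grpc import service_pb2, service_pb2_grpc

request = service_pb2.ModelInferRequest()
request.model_name = "my_mediapipe_graph"

inp = request.inputs.add()
inp.name = "in"
inp.datatype = "INT32"
inp.shape.extend([1, 3])
inp.contents.int_contents.extend([1, 2, 3])  # one of the *_contents fields

channel = grpc.insecure_channel("localhost:9000")
stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)
response = stub.ModelInfer(request)
```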
* Fixing references

* Fix internal link
rasapala and others added 29 commits May 6, 2024 14:47
* universal_and_benchmark_documentation_updates

* no proxy update

* update benchmark proxy

* add version to ubuntu tag

* revert ubuntu changes

* added localhost

* review
* Dockerfile for Gradio
* monitoring changes in the documentation scope
* preinstall nltk modules (see the sketch after this list)
* default security context set to the ovms account
* improvements in the RAG demo
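A rough sketch of the NLTK preinstall step mentioned above; the resource names and download directory are assumptions, and the actual demo may fetch different data:

```python
# Download NLTK data at image build time so the demo does not fetch it at
# runtime. Resource names and target directory are illustrative only.
import nltk

for resource in ("punkt", "stopwords"):
    nltk.download(resource, download_dir="/usr/share/nltk_data")
```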
CVS-138032

Implementation of the /v3/chat/completions endpoint and forwarding of the HTTP message to the MediaPipe graph.
The data is a std::string for now, to be adjusted in the following tasks (CVS-139240/CVS-140684).
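A minimal client sketch against this endpoint, assuming the server listens on localhost:8000 and serves a model named "llama"; both are placeholders:

```python
# Query the /v3/chat/completions endpoint with the standard OpenAI client.
# Base URL, port, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
response = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```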
* CVS-137992_fix_deadline_exceeded_dg2

* add retry for get_model_metadata_request (see the retry sketch after this list)

* add get_model_metadata function

* fix test names

* increase timeout for GetModelStatus
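A sketch of the retry pattern described above for a gRPC GetModelMetadata call; the helper name, retry count, and timeout are made up for illustration:

```python
# Retry GetModelMetadata on DEADLINE_EXCEEDED, as the fix above suggests.
# The stub is a TF Serving PredictionService stub; parameters are illustrative.
import time
import grpc

def get_model_metadata(stub, request, retries=5, timeout_s=10.0):
    for attempt in range(retries):
        try:
            return stub.GetModelMetadata(request, timeout=timeout_s)
        except grpc.RpcError as err:
            if err.code() != grpc.StatusCode.DEADLINE_EXCEEDED or attempt == retries - 1:
                raise
            time.sleep(1.0)  # back off briefly before retrying
```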
https://jira.devtools.intel.com/browse/CVS-139240
Implementation of the conversion of chat completion requests to the HttpPayload struct.
* Fix OVMS status to HTTP status conversion
* add-version-to-ubuntu-os

* fix ovms_pkg link

* BASE_OS_DISTRO

* ovms_pkg os

* updates

* DIST_OS added

* adjust nginx build

* fix nginx

* Update Makefile

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>

* Update Makefile

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>

---------

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-139231/CVS-139233

This introduces an LLM calculator that accepts HTTP OpenAI /v3/chat/completions requests and produces compliant responses.
It works in both unary and streaming modes.
A number of parameters are still marked as TODO, but the implementation should be enough to run benchmarks.

Includes a minimal description of how to run the demo.
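A sketch of exercising the streaming mode with the OpenAI client; the endpoint address and model name are placeholders, as above:

```python
# Stream tokens from the /v3/chat/completions endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
stream = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```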
* Add scheduler config in graph options
* Fix centos stream-8
---------

Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-142768

Forwards beam search and multinomial sampling parameters to the CB library; this enables returning more than one completion for beam search (unary only).
Adds profiling traces (minitrace)
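A hypothetical request exercising the forwarded parameters; with beam search, n > 1 now returns multiple completions in unary mode. Endpoint, model name, and parameter values are placeholders:

```python
# Request several completions at once; with beam search this now returns
# more than one choice (unary mode only). Values are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
response = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Name a color"}],
    n=3,
    temperature=0.0,
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```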
* Add UTs for llm request conversion
* fix tbb handling for ubuntu20
There is an issue (or feature?) where adding a newly generated token to the token cache can produce a shorter message than the previous one without the new token.

TextStreamer did not expect such behavior.

The fix ignores such an event and makes the generation wait for the next tokens.

It also reduces the number of response chunks by requiring that a chunk contain a space before the cache is sent to the client.
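A minimal sketch of the streaming heuristic described above; the names and structure are illustrative, not the OVMS implementation:

```python
# Sketch of the described fix: ignore the event where detokenizing the cache
# yields shorter text than already sent, and only flush chunks that contain
# a space. The detokenize callable maps a list of token ids to a string.
class TextStreamer:
    def __init__(self, detokenize):
        self.detokenize = detokenize
        self.token_cache = []
        self.sent_len = 0  # characters already sent to the client

    def put(self, token_id):
        self.token_cache.append(token_id)
        text = self.detokenize(self.token_cache)
        if len(text) <= self.sent_len:
            return None  # text got shorter or did not grow: wait for more tokens
        chunk = text[self.sent_len:]
        if " " not in chunk:
            return None  # reduce chunk count: flush only once a space appears
        self.sent_len = len(text)
        return chunk
```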
atobiszei closed this on Jun 11, 2024